54 research outputs found

    Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions

    Full text link
    Kernel mean embeddings have recently attracted the attention of the machine learning community. They map measures μ\mu from some set MM to functions in a reproducing kernel Hilbert space (RKHS) with kernel kk. The RKHS distance of two mapped measures is a semi-metric dkd_k over MM. We study three questions. (I) For a given kernel, what sets MM can be embedded? (II) When is the embedding injective over MM (in which case dkd_k is a metric)? (III) How does the dkd_k-induced topology compare to other topologies on MM? The existing machine learning literature has addressed these questions in cases where MM is (a subset of) the finite regular Borel measures. We unify, improve and generalise those results. Our approach naturally leads to continuous and possibly even injective embeddings of (Schwartz-) distributions, i.e., generalised measures, but the reader is free to focus on measures only. In particular, we systemise and extend various (partly known) equivalences between different notions of universal, characteristic and strictly positive definite kernels, and show that on an underlying locally compact Hausdorff space, dkd_k metrises the weak convergence of probability measures if and only if kk is continuous and characteristic.Comment: Old and longer version of the JMLR paper with same title (published 2018). Please start with the JMLR version. 55 pages (33 pages main text, 22 pages appendix), 2 tables, 1 figure (in appendix

    Distribution-Dissimilarities in Machine Learning

    Get PDF
    Any binary classifier (or score-function) can be used to define a dissimilarity between two distributions. Many well-known distribution-dissimilarities are actually classifier-based: total variation, KL- or JS-divergence, Hellinger distance, etc. And many recent popular generative modeling algorithms compute or approximate these distribution-dissimilarities by explicitly training a classifier: e.g. generative adversarial networks (GAN) and their variants. This thesis introduces and studies such classifier-based distribution-dissimilarities. After a general introduction, the first part analyzes the influence of the classifiers' capacity on the dissimilarity's strength for the special case of maximum mean discrepancies (MMD) and provides applications. The second part studies applications of classifier-based distribution-dissimilarities in the context of generative modeling and presents two new algorithms: Wasserstein Auto-Encoders (WAE) and AdaGAN. The third and final part focuses on adversarial examples, i.e. targeted but imperceptible input-perturbations that lead to drastically different predictions of an artificial classifier. It shows that adversarial vulnerability of neural network based classifiers typically increases with the input-dimension, independently of the network topology

    PopSkipJump: Decision-Based Attack for Probabilistic Classifiers

    Full text link
    Most current classifiers are vulnerable to adversarial examples, small input perturbations that change the classification output. Many existing attack algorithms cover various settings, from white-box to black-box classifiers, but typically assume that the answers are deterministic and often fail when they are not. We therefore propose a new adversarial decision-based attack specifically designed for classifiers with probabilistic outputs. It is based on the HopSkipJump attack by Chen et al. (2019, arXiv:1904.02144v5 ), a strong and query efficient decision-based attack originally designed for deterministic classifiers. Our P(robabilisticH)opSkipJump attack adapts its amount of queries to maintain HopSkipJump's original output quality across various noise levels, while converging to its query efficiency as the noise level decreases. We test our attack on various noise models, including state-of-the-art off-the-shelf randomized defenses, and show that they offer almost no extra robustness to decision-based attacks. Code is available at https://github.com/cjsg/PopSkipJump .Comment: ICML'21. Code available at https://github.com/cjsg/PopSkipJump . 9 pages & 7 figures in main part, 14 pages & 10 figures in appendi

    Removing systematic errors for exoplanet search via latent causes

    Full text link
    We describe a method for removing the effect of confounders in order to reconstruct a latent quantity of interest. The method, referred to as half-sibling regression, is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification and illustrate the potential of the method in a challenging astronomy application.Comment: Extended version of a paper appearing in the Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 201

    Assaying Out-Of-Distribution Generalization in Transfer Learning

    Full text link
    Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting. Our findings confirm that in- and out-of-distribution accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller scale studies

    Object-Centric Multiple Object Tracking

    Full text link
    Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art models achieve pixel-level accuracy and temporal consistency by relying on supervised object detection with additional ID labels for the association through time. This paper proposes a video object-centric model for MOT. It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module that builds complete object prototypes to handle occlusions. Benefited from object-centric learning, we only require sparse detection labels (0%-6.25%) for object localization and feature binding. Relying on our self-supervised Expectation-Maximization-inspired loss for object association, our approach requires no ID labels. Our experiments significantly narrow the gap between the existing object-centric model and the fully supervised state-of-the-art and outperform several unsupervised trackers.Comment: ICCV 2023 camera-ready versio

    Search for dark matter produced in association with bottom or top quarks in √s = 13 TeV pp collisions with the ATLAS detector

    Get PDF
    A search for weakly interacting massive particle dark matter produced in association with bottom or top quarks is presented. Final states containing third-generation quarks and miss- ing transverse momentum are considered. The analysis uses 36.1 fb−1 of proton–proton collision data recorded by the ATLAS experiment at √s = 13 TeV in 2015 and 2016. No significant excess of events above the estimated backgrounds is observed. The results are in- terpreted in the framework of simplified models of spin-0 dark-matter mediators. For colour- neutral spin-0 mediators produced in association with top quarks and decaying into a pair of dark-matter particles, mediator masses below 50 GeV are excluded assuming a dark-matter candidate mass of 1 GeV and unitary couplings. For scalar and pseudoscalar mediators produced in association with bottom quarks, the search sets limits on the production cross- section of 300 times the predicted rate for mediators with masses between 10 and 50 GeV and assuming a dark-matter mass of 1 GeV and unitary coupling. Constraints on colour- charged scalar simplified models are also presented. Assuming a dark-matter particle mass of 35 GeV, mediator particles with mass below 1.1 TeV are excluded for couplings yielding a dark-matter relic density consistent with measurements
    corecore